Fig. 3.23. An illustration of the introduction of the momentum term.
In order to improve the learning quality of an MLP model, the above
update rule has been altered by introducing the momentum term
[Rumelhart, et al., 1986], which is based on the update of the model
parameters in the previous learning cycle. The momentum term can
prevent over-update of model parameters. The use of the
momentum term is shown below, where $0 < \alpha < 1$ is called the
momentum factor,
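$$\Delta \mathbf{w}_{t+1} = -\eta \frac{\partial \varepsilon_t}{\partial \mathbf{w}_t} + \alpha \Delta \mathbf{w}_t \qquad (3.36)$$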
In the above equation, $\Delta \mathbf{w}_t$ stands for the update of w at the learning
cycle t, and $\Delta \mathbf{w}_{t+1}$ stands for the update of w at the learning cycle t + 1. It
can be seen that the update term $\eta\, \partial \varepsilon_t / \partial \mathbf{w}_t$ and the momentum term
$\alpha \Delta \mathbf{w}_t$ can have different signs, hence different directions. Whenever the term
$\eta\, \partial \varepsilon_t / \partial \mathbf{w}_t$ goes too far, the term $\alpha \Delta \mathbf{w}_t$ will pull the move backward.
Therefore the momentum term can reduce the oscillation
and prevent a potential move in a wrong direction so as to
reach the saddle point on the error function curve. As shown in Figure 3.23,
the move goes from $x$ to $x_3$, which brings the move toward the saddle point.
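The momentum update of Eq. (3.36) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the book's implementation: the one-dimensional error function ε(w) = w²/2 (so the gradient is simply w), the function name `descend`, and the values η = 1.9 and α = 0.5 are all hypothetical choices made so that the plain update rule visibly overshoots and oscillates. Setting α = 0 recovers the update rule without momentum.

```python
def descend(grad, w0, eta, alpha=0.0, steps=20):
    """Iterate delta_{t+1} = -eta * grad(w_t) + alpha * delta_t, as in Eq. (3.36).

    grad  : function returning the derivative of the error at w (assumed example)
    w0    : initial parameter value
    eta   : learning rate
    alpha : momentum factor, 0 <= alpha < 1 (alpha = 0 disables momentum)
    """
    w, delta = w0, 0.0          # no previous update exists at the first cycle
    path = [w]
    for _ in range(steps):
        delta = -eta * grad(w) + alpha * delta   # gradient term plus momentum term
        w = w + delta                            # apply the update
        path.append(w)
    return path


# Assumed toy error function: eps(w) = 0.5 * w**2, whose gradient is w.
def toy_grad(w):
    return w


plain = descend(toy_grad, w0=1.0, eta=1.9)                  # no momentum
with_momentum = descend(toy_grad, w0=1.0, eta=1.9, alpha=0.5)

print("final |w| without momentum:", abs(plain[-1]))
print("final |w| with momentum:   ", abs(with_momentum[-1]))
```

With these particular settings the plain rule flips sign at every step and shrinks slowly, while the run with momentum ends much closer to the minimum at w = 0 after the same number of cycles. Other choices of η and α can behave differently, so the comparison should be read as an illustration only.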
Another advanced approach for improving the learning capability is the use
of the second order derivative, such as the Hessian matrix [Bishop,